track topics by product for use by AI-based classifier agent #6629

Merged: 2 commits, Apr 15, 2025

Contributor

@escattone escattone commented Apr 14, 2025

This PR adds a YAML file that details the topic hierarchies for each product, including descriptions. It will be useful for building the prompt for AI agents that will automatically classify questions by topic.

It was generated by the script below (run in production), then manually adjusted to remove a few remaining legacy topics and add a few missing ones. Finally, it was manually verified against the "golden" list of topics defined in the Taxonomy Map PDF.
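The manual check against the golden list could be partly automated. The following is a minimal sketch, separate from the generation script in this description. It assumes the YAML loads into a dictionary with a `products` list, where each product has `topics` and each topic may have nested `subtopics` (the shape the generation script builds), and that the golden list has been transcribed by hand into a set of title tuples. The names `topic_paths` and `compare_with_golden` and all sample data are hypothetical.

```python
def topic_paths(taxonomy):
    """Return the set of (product, topic, ...) title tuples in a taxonomy dict."""
    paths = set()

    def walk(node, prefix):
        path = prefix + (node["title"],)
        paths.add(path)
        # Leaf topics may have no "subtopics" key, so default to an empty list.
        for child in node.get("subtopics", []):
            walk(child, path)

    for product in taxonomy["products"]:
        for topic in product["topics"]:
            walk(topic, (product["title"],))
    return paths


def compare_with_golden(taxonomy, golden_paths):
    """Return (missing, unexpected) path sets relative to the golden list."""
    actual = topic_paths(taxonomy)
    golden = set(golden_paths)
    return golden - actual, actual - golden


# Hypothetical data for illustration only, not the real taxonomy.
generated = {
    "products": [
        {
            "title": "Example Product",
            "topics": [
                {"title": "Settings", "subtopics": [{"title": "Privacy"}]},
                {"title": "Legacy topic"},
            ],
        }
    ]
}
golden = {
    ("Example Product", "Settings"),
    ("Example Product", "Settings", "Privacy"),
    ("Example Product", "Accounts"),
}

missing, unexpected = compare_with_golden(generated, golden)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

Paths in `missing` would point at topics still to be added, and paths in `unexpected` at legacy topics still to be removed.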

TODO: We should update this file with the more detailed descriptions that Josh is working on, once they are finalized.

from itertools import chain

import yaml

from kitsune.products.models import Product, Topic


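def get_taxonomy(product_slug=None, **kwargs):
    """Print the topic hierarchy (up to three levels) per product as YAML.

    Extra keyword arguments are passed through as filters on every Topic query.
    """

    def clean(text):
        # Replace the typographic apostrophe (U+2019) with a plain ASCII one.
        return text.replace("\u2019", "'")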

    result = dict(products=[])

    if product_slug:
        products = [Product.active.get(slug=product_slug)]
    else:
        products = chain(
            Product.active.filter(visible=True),
            Product.active.filter(slug="mozilla-account"),
        )

    for product in products:
        pdict = dict(
            title=product.title, description=clean(product.description), topics=[]
        )
        result["products"].append(pdict)
        for t1 in Topic.active.filter(
            products=product, parent=None, visible=True, **kwargs
        ):
            t1_dict = dict(
                title=t1.title, description=clean(t1.description), subtopics=[]
            )
            pdict["topics"].append(t1_dict)
            for t2 in Topic.active.filter(
                products=product, parent=t1, visible=True, **kwargs
            ):
                t2_dict = dict(
                    title=t2.title, description=clean(t2.description), subtopics=[]
                )
                t1_dict["subtopics"].append(t2_dict)
                for t3 in Topic.active.filter(
                    products=product, parent=t2, visible=True, **kwargs
                ):
                    t2_dict["subtopics"].append(
                        dict(title=t3.title, description=clean(t3.description))
                    )

    print(yaml.dump(result, sort_keys=False))
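
For context on how a classifier prompt might consume this data, here is a minimal sketch that flattens the nested structure built above into one line per topic path. The function name `topic_lines`, the `" > "` separator, and the sample data are assumptions for illustration and are not part of this PR. In practice the dictionary would come from loading the YAML file (for example with `yaml.safe_load`).

```python
def topic_lines(taxonomy):
    """Yield 'Product > Topic > Subtopic: description' lines.

    Assumes the shape produced by get_taxonomy(): a "products" list whose
    entries hold "topics", each of which may hold nested "subtopics".
    """

    def walk(node, path):
        path = path + [node["title"]]
        yield f"{' > '.join(path)}: {node['description']}"
        # Third-level topics have no "subtopics" key, so default to empty.
        for child in node.get("subtopics", []):
            yield from walk(child, path)

    for product in taxonomy["products"]:
        for topic in product["topics"]:
            yield from walk(topic, [product["title"]])


# Hypothetical sample data, not taken from the real taxonomy file.
sample = {
    "products": [
        {
            "title": "Example Product",
            "description": "A made-up product.",
            "topics": [
                {
                    "title": "Settings",
                    "description": "Configure the product.",
                    "subtopics": [
                        {
                            "title": "Privacy",
                            "description": "Privacy options.",
                            "subtopics": [],
                        },
                    ],
                },
            ],
        }
    ]
}

lines = list(topic_lines(sample))
print("\n".join(lines))
```

Emitting the full path on each line keeps every topic unambiguous even when the same subtopic title appears under several products, which may matter when the lines are pasted into a prompt.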

@escattone escattone changed the title track topics by product for AI-based classifier track topics by product for use by AI-based classifier agent Apr 14, 2025
Collaborator

@akatsoulas akatsoulas left a comment

Let's also add the script under the scripts folder, and move the generated YAML file to either docs or docs/taxonomy.

@escattone escattone merged commit 86668ab into mozilla:main Apr 15, 2025
1 check was pending
@escattone escattone deleted the add-topic-hierarchies-per-product branch April 15, 2025 14:54